# INT4 Quantization
Mistral Small 3.1 24B Instruct 2503 Quantized.w4a16
Apache-2.0
This is an INT4-quantized Mistral-Small-3.1-24B-Instruct-2503 model, optimized and released by Red Hat (Neural Magic), suitable for fast-response dialogue agents and low-latency inference scenarios.
Image-Text-to-Text
Safetensors Supports Multiple Languages
RedHatAI
219
1
Gemma 3 4b It GPTQ 4b 128g
An INT4-quantized version of the gemma-3-4b-it model that significantly reduces storage and compute requirements.
Image-to-Text
Transformers

ISTA-DASLab
502
2
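The "4b 128g" in the name above most likely denotes 4-bit weights with one scale per group of 128 values; that reading is an assumption, not something the listing states. As a rough illustration of group-wise INT4 storage, here is a minimal sketch using plain round-to-nearest quantization. It is not the GPTQ algorithm itself, which additionally compensates for rounding error, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

GROUP_SIZE = 128  # assumed meaning of "128g": one scale/offset per 128 weights
LEVELS = 15       # 4 bits give integer codes 0..15


def quantize_group(values: List[float]) -> Tuple[List[int], float, float]:
    """Map one group of floats to 4-bit codes plus a scale and an offset."""
    lo, hi = min(values), max(values)
    # Fall back to 1.0 so a constant group does not divide by zero.
    scale = (hi - lo) / LEVELS or 1.0
    codes = [min(LEVELS, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]


def quantize(weights: List[float], group_size: int = GROUP_SIZE):
    """Quantize a flat weight list group by group."""
    return [
        quantize_group(weights[i:i + group_size])
        for i in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    # Synthetic weights, purely for illustration.
    weights = [math.sin(i) for i in range(256)]
    for codes, scale, lo in quantize(weights):
        restored = dequantize_group(codes, scale, lo)
        print(f"scale={scale:.4f} offset={lo:.4f} first value ~ {restored[0]:.4f}")
```

Smaller groups track the local weight range more closely at the cost of storing more scales and offsets, which is the trade-off a group-size suffix in a model name usually signals.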
Whisper Large V3.w4a16
Apache-2.0
This is a quantized version of openai/whisper-large-v3 with weights quantized to INT4 and activations kept in FP16, suitable for inference with vLLM.
Speech Recognition
Transformers English

nm-testing
20
1
Svdq Int4 Flux.1 Depth Dev
Other
An INT4-quantized version of FLUX.1-Depth-dev that generates images from text descriptions while following the structure of an input image. Compared with the original BF16 model, this version uses roughly 4x less memory and runs 2-3x faster.
Image Generation English
mit-han-lab
9,085
3
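The roughly 4x memory figure quoted above is consistent with simple bit-width arithmetic: 16-bit weights shrink to 4 bits each. A minimal sketch of that calculation follows; the parameter count is an illustrative assumption rather than a figure from this listing, and real savings are usually a little smaller because scales, zero points, and any unquantized layers still take space.

```python
def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store the weights alone at a given bit width."""
    return num_params * bits_per_weight / 8


# Illustrative parameter count (an assumption, not taken from the listing).
NUM_PARAMS = 12_000_000_000

bf16 = weight_bytes(NUM_PARAMS, 16)
int4 = weight_bytes(NUM_PARAMS, 4)
ratio = bf16 / int4

print(f"BF16: {bf16 / 1e9:.1f} GB, INT4: {int4 / 1e9:.1f} GB, ratio: {ratio:.1f}x")
```

The same function gives a quick first estimate for any model in this list: swap in its parameter count and bit width.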
FLUX.1 Dev Qint4
Other
An INT4-quantized version of the FLUX.1-dev text-to-image model, produced with Optimum Quanto and intended for non-commercial use.
Text-to-Image English
Disty0
455
12
Meta Llama 3.1 8B Instruct Quantized.w4a16
A quantized version of Meta-Llama-3.1-8B-Instruct that reduces disk space and GPU memory requirements, suitable for English-language chat assistants in business and research settings.
Large Language Model
Transformers Supports Multiple Languages

RedHatAI
27.51k
28
Meta Llama 3.1 70B Instruct AWQ INT4
An INT4-quantized version of Llama 3.1 70B Instruct, quantized with AutoAWQ and suitable for multilingual dialogue.
Large Language Model
Transformers Supports Multiple Languages

hugging-quants
80.59k
100
Meta Llama 3.1 8B Instruct AWQ INT4
An INT4-quantized version of Llama 3.1 8B Instruct, quantized with AutoAWQ and suitable for multilingual dialogue.
Large Language Model
Transformers Supports Multiple Languages

hugging-quants
348.23k
67
Whisper Large Onnx Int4 Inc
Apache-2.0
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. This repository provides the Whisper large model in ONNX format with INT4 weight quantization, produced with Intel® Neural Compressor and Intel® Extension for Transformers.
Speech Recognition
Transformers

Intel
44
8